Beyond the Nystrom Approximation: Speeding up Spectral Clustering using Uniform Sampling and Weighted Kernel k-means

Authors

  • Mahesh Mohan
  • Claire Monteleoni
Abstract

In this paper, we present a framework for spectral clustering based on the following simple scheme: sample a subset of the input points, compute the clusters for the sampled subset using weighted kernel k-means (Dhillon et al., 2004), and use the resulting cluster centers to compute a clustering of the remaining data points. For the case where the points are sampled uniformly at random without replacement, we show that the number of samples required depends mainly on the number of clusters and the diameter of the set of points in the kernel space. Experiments show that the proposed framework outperforms approaches based on the Nyström approximation in both accuracy and computation time.
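The sample-then-extend scheme can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices in it are assumptions that do not come from the paper:

- an RBF kernel with a bandwidth parameter `gamma`;
- unit point weights by default;
- farthest-first seeding;
- assigning each unsampled point to the nearest weighted cluster mean in kernel space.

All function names (`rbf_kernel`, `kernel_distances`, `weighted_kernel_kmeans`, `sample_and_extend`) are hypothetical. The sketch also does not reproduce the specific weights and kernel under which weighted kernel k-means matches a spectral objective (Dhillon et al., 2004).

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) between rows of X and Y (assumed kernel)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kernel_distances(diag_x, K_xs, K_ss, w, labels, k):
    """Squared feature-space distance from each query point to each weighted cluster mean.

    ||phi(x) - m_j||^2 = K(x,x) - 2 * sum_i w_i K(x, x_i) / s_j
                         + sum_{i,l} w_i w_l K(x_i, x_l) / s_j^2,
    where i, l range over cluster j and s_j is the total weight of cluster j.
    """
    dist = np.full((K_xs.shape[0], k), np.inf)
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # empty cluster: its distance stays +inf
        wj = w[idx]
        s = wj.sum()
        cross = K_xs[:, idx] @ wj / s
        within = wj @ K_ss[np.ix_(idx, idx)] @ wj / s**2
        dist[:, j] = diag_x - 2.0 * cross + within
    return dist


def weighted_kernel_kmeans(K, w, k, n_iter=100, rng=None):
    """Lloyd-style weighted kernel k-means on a precomputed kernel matrix K."""
    rng = np.random.default_rng(rng)
    diag = np.diag(K)
    # Farthest-first seeding in kernel space (an assumption, not from the paper).
    seeds = [int(rng.integers(K.shape[0]))]
    d_min = diag - 2.0 * K[:, seeds[0]] + diag[seeds[0]]
    for _ in range(1, k):
        seeds.append(int(np.argmax(d_min)))
        d_min = np.minimum(d_min, diag - 2.0 * K[:, seeds[-1]] + diag[seeds[-1]])
    # Initial assignment: nearest seed in kernel space.
    labels = np.argmin(diag[:, None] - 2.0 * K[:, seeds] + diag[seeds][None, :], axis=1)
    for _ in range(n_iter):
        new_labels = np.argmin(kernel_distances(diag, K, K, w, labels, k), axis=1)
        if np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


def sample_and_extend(X, k, m, gamma=1.0, weights=None, rng=None):
    """Cluster m uniformly sampled points, then label the rest by nearest cluster mean."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    sample = rng.choice(n, size=m, replace=False)  # uniform, without replacement
    # Hypothetical default: unit weights, which reduces to plain kernel k-means.
    w_all = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    w = w_all[sample]
    K_ss = rbf_kernel(X[sample], X[sample], gamma)
    labels = np.empty(n, dtype=int)
    labels[sample] = weighted_kernel_kmeans(K_ss, w, k, rng=rng)
    rest = np.setdiff1d(np.arange(n), sample)
    if rest.size:
        K_rs = rbf_kernel(X[rest], X[sample], gamma)
        diag_rest = np.ones(rest.size)  # K(x, x) = 1 for the RBF kernel
        dist = kernel_distances(diag_rest, K_rs, K_ss, w, labels[sample], k)
        labels[rest] = np.argmin(dist, axis=1)
    return labels
```

For example, `sample_and_extend(X, k=2, m=20, gamma=0.5, rng=1)` clusters 20 sampled rows of `X` and then labels the others.

The kernel is only evaluated against the m sampled points. Memory therefore grows as O(nm) rather than the O(n^2) needed for a full kernel matrix.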

Similar resources

Scalable Kernel K-Means Clustering with Nystrom Approximation: Relative-Error Bounds

Kernel k-means clustering can correctly identify and extract a far more varied collection of cluster structures than the linear k-means clustering algorithm. However, kernel k-means clustering is computationally expensive when the non-linear feature map is high-dimensional and there are many input points. Kernel approximation, e.g., the Nyström method, has been applied in previous works to approx...

A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts

Recently, a variety of clustering algorithms have been proposed to handle data that is not linearly separable. Spectral clustering and kernel k-means are two such methods that are seemingly quite different. In this paper, we show that a general weighted kernel k-means objective is mathematically equivalent to a weighted graph partitioning objective. Special cases of this graph partitioning ob...

Link-based Community Detection with the Commute-Time Kernel

The main purpose of this work is to find communities in a weighted, undirected, graph by using kernel-based clustering methods, directly partitioning the graph according to a well-defined similarity measure between the nodes (a kernel on a graph). The algorithm is based on a two-step procedure. First, the sigmoid commute-time kernel (KCT), providing a meaningful similarity measure between any c...

Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering

Clustering algorithms are highly dependent on different factors such as the number of clusters, the specific clustering algorithm, and the distance measure used. Inspired by ensemble classification, one approach to reduce the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...

Distributed Adaptive Sampling for Kernel Matrix Approximation

Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or k-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $K_n$ requires at least $O(n^2)$ time and space for n samples. Recent works [1, 9] show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of r...

Publication date: 2017